Remove unnecessary calls to CodecPool.returnCompressor/returnDecompressor to avoid race conditions #103
Conversation
The input/output stream implementations erroneously add the (de)compressors back to the CodecPool on close. The user who creates the (de)compressor is responsible for returning it, and if they return a decompressor that the stream has already returned, the same instance ends up in the pool twice.
Thanks for the PR @themodernlife. Unfortunately, I'm afraid the problem is more complicated. Most use cases go through LzopCodec to create the input stream, but LzopCodec itself has two conflicting ways of managing the decompressor instances. On the one hand, a caller can pass in a decompressor it obtained from the CodecPool itself. On the other hand, LzopCodec also has an overload that obtains a decompressor on its own. So if we removed the call to return the decompressor within the stream's close(), the decompressors created through that second path would never make it back to the pool.
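Roughly, the two paths in question look like this. This is an illustrative sketch, not LzopCodec's actual source; it assumes, as described above, that the single-argument createInputStream obtains its own decompressor internally:

```java
import java.io.IOException;
import java.io.InputStream;

import org.apache.hadoop.io.compress.CodecPool;
import org.apache.hadoop.io.compress.Decompressor;

import com.hadoop.compression.lzo.LzopCodec;

class CreationPaths {
    // Path 1: the caller obtains the decompressor from the pool and therefore
    // owns it; the caller is expected to call CodecPool.returnDecompressor().
    static InputStream callerManaged(LzopCodec codec, InputStream raw) throws IOException {
        Decompressor decompressor = CodecPool.getDecompressor(codec);
        return codec.createInputStream(raw, decompressor);
    }

    // Path 2: LzopCodec obtains a decompressor on its own; the caller never
    // sees it, so only the returned stream can hand it back to the pool.
    static InputStream codecManaged(LzopCodec codec, InputStream raw) throws IOException {
        return codec.createInputStream(raw);
    }
}
```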
When LzopCodec creates the decompressor itself, it could return a filtered input stream that returns the decompressor to the pool when the stream is closed. That is probably all that is required.
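A minimal sketch of the kind of wrapper being suggested (the class name is hypothetical; this is not the code that ended up in the PR):

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

import org.apache.hadoop.io.compress.CodecPool;
import org.apache.hadoop.io.compress.Decompressor;

/**
 * Wraps the compression stream built around a pool-obtained decompressor and
 * returns that decompressor to the CodecPool exactly once, when the stream is
 * closed.
 */
class DecompressorReturningInputStream extends FilterInputStream {
    private Decompressor decompressor;

    DecompressorReturningInputStream(InputStream in, Decompressor decompressor) {
        super(in);
        this.decompressor = decompressor;
    }

    @Override
    public void close() throws IOException {
        try {
            super.close();
        } finally {
            if (decompressor != null) {
                CodecPool.returnDecompressor(decompressor);
                decompressor = null; // guard against a double return if close() is called twice
            }
        }
    }
}
```

Because FilterInputStream delegates every read variant to the wrapped stream, only close() needs to be overridden here, which is also the point raised in the review comments below.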
That sounds like a good approach. @themodernlife, would you like to update your PR to do that, both for the compressor and the decompressor?
PR updated. Good idea @rangadi!
}

@Override
public int read() throws IOException {
You should implement read(byte b[], int off, int len); otherwise, reads will be very slow.
Actually, even better is to make it extend FilterInputStream and override only close(). Same for the OutputStream.
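For the output side, a hedged sketch of the same idea (hypothetical class name, not the PR's code; note that FilterOutputStream's default bulk write copies one byte at a time, so the array overload is forwarded explicitly):

```java
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;

import org.apache.hadoop.io.compress.CodecPool;
import org.apache.hadoop.io.compress.Compressor;

/**
 * Returns the pool-obtained compressor to the CodecPool exactly once, when the
 * stream is closed.
 */
class CompressorReturningOutputStream extends FilterOutputStream {
    private Compressor compressor;

    CompressorReturningOutputStream(OutputStream out, Compressor compressor) {
        super(out);
        this.compressor = compressor;
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        out.write(b, off, len); // forward directly instead of byte-by-byte
    }

    @Override
    public void close() throws IOException {
        try {
            super.close();
        } finally {
            if (compressor != null) {
                CodecPool.returnCompressor(compressor);
                compressor = null; // guard against a double return if close() is called twice
            }
        }
    }
}
```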
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.*;
Minor: I don't think the Hadoop code base encourages * imports.
+1. Thanks for the updates.
LGTM. Thanks @themodernlife for your contribution! I'll merge it shortly.
Remove unnecessary calls to CodecPool.returnCompressor/returnDecompressor to avoid race conditions
The input/output stream implementations erroneously add the (de)compressors back to the CodecPool on close, even though they didn't get the (de)compressors from the pool. The user who creates the (de)compressor is responsible for returning it; if both the user and the stream return the same decompressor, you end up with the same instance in the pool twice, which leads to a race condition. This fixes #91 and #94.
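For illustration, this is the caller pattern the description refers to (hypothetical caller code; the file path and buffer size are arbitrary):

```java
import java.io.FileInputStream;
import java.io.InputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.compress.CodecPool;
import org.apache.hadoop.io.compress.Decompressor;

import com.hadoop.compression.lzo.LzopCodec;

public class DoubleReturnExample {
    public static void main(String[] args) throws Exception {
        LzopCodec codec = new LzopCodec();
        codec.setConf(new Configuration());

        // The caller obtains the decompressor from the pool, so the caller owns it.
        Decompressor decompressor = CodecPool.getDecompressor(codec);
        try (InputStream in = codec.createInputStream(new FileInputStream(args[0]), decompressor)) {
            byte[] buf = new byte[8192];
            while (in.read(buf) != -1) {
                // consume the stream
            }
        }
        // Before this patch, close() above had already returned the decompressor;
        // this second return put the same instance into the pool twice, so two
        // later callers of CodecPool.getDecompressor() could end up sharing one instance.
        CodecPool.returnDecompressor(decompressor);
    }
}
```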
There was some concern that this might break some code in the wild.
FWIW, I did a quick search on GitHub to see how people are using this library, and there really wasn't much to speak of outside of forks and Hadoop code. The code I did find uses CodecPool properly (getting and returning), so this patch wouldn't be an issue, and it should also work cleanly with any Hadoop setup. The only way I can see a user running into a problem is if they get the decompressor/compressor from the CodecPool and then don't return it, in which case they are really using CodecPool wrong, which I would hope is not common enough to justify keeping this fix out. My main motivation is that this makes it possible to use Spark safely with LZO (see #91).
Hope you guys can incorporate it one way or another! Maybe a 0.5.x release (to clearly signal any potential change of behavior)?